
    Portable compiler optimisation across embedded programs and microarchitectures using machine learning

    Building an optimising compiler is a difficult and time-consuming task that must be repeated for each generation of a microprocessor. As the underlying microarchitecture changes from one generation to the next, the compiler must be retuned to optimise specifically for that new system. It may take several releases of the compiler to effectively exploit a processor’s performance potential, by which time a new generation has appeared and the process starts again. We address this challenge by developing a portable optimising compiler. Our approach employs machine learning to automatically learn the best optimisations to apply for any new program on a new microarchitectural configuration. It achieves this by learning a model off-line that maps a microarchitecture description, plus the hardware counters from a single run of the program, to the best compiler optimisation passes. Our compiler gains 67% of the maximum speedup obtainable by an iterative compiler search using 1000 evaluations. We obtain, on average, a 1.16x speedup over the highest default optimisation level across an entire microarchitecture configuration space, achieving a 4.3x speedup in the best case. We demonstrate the robustness of this technique by applying it to an extended microarchitectural space, where we achieve comparable performance.
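The abstract describes a model, learned off-line, that maps a microarchitecture description plus one run's hardware counters to the best optimisation passes. The Python below is a minimal sketch of that kind of mapping. It assumes a simple 1-nearest-neighbour predictor, which may differ from the model the paper actually uses. It scales hypothetical feature vectors to a common range and returns the pass set recorded for the closest training point. The feature layout, the `TRAINING` data and the GCC-style flag names are illustrative assumptions, not values from the paper.

```python
import math

# Hypothetical off-line training data. Each feature vector concatenates a
# microarchitecture description (cache size in KB, issue width) with hardware
# counters from one profiling run (cache miss rate, IPC); each label is the
# pass set an iterative search found best for that program/configuration pair.
TRAINING = [
    ((32.0, 2.0, 0.10, 0.8), {"-funroll-loops", "-finline-functions"}),
    ((8.0, 1.0, 0.35, 0.4), {"-Os"}),
    ((64.0, 4.0, 0.05, 1.6), {"-funroll-loops", "-fschedule-insns"}),
]


def make_scaler(vectors):
    """Return a function mapping each feature to [0, 1] over `vectors`."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]

    def scale(v):
        return tuple((x - lo) / (hi - lo) if hi > lo else 0.0
                     for x, lo, hi in zip(v, lows, highs))

    return scale


def predict_passes(features, training=TRAINING):
    """1-nearest-neighbour: reuse the passes of the closest training point."""
    scale = make_scaler([f for f, _ in training] + [features])
    target = scale(features)
    _, passes = min(training, key=lambda item: math.dist(scale(item[0]), target))
    return passes


# A new program on a new configuration, profiled once (made-up numbers).
print(sorted(predict_passes((30.0, 2.0, 0.12, 0.9))))
```

Scaling each feature to [0, 1] matters here. Without it, a large-valued feature such as cache size would swamp small ones such as miss rate in the Euclidean distance.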

    Automatic and Portable Mapping of Data Parallel Programs to OpenCL for GPU-Based Heterogeneous Systems

    General-purpose GPU-based systems are highly attractive, as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This article presents a compiler-based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures. It also uses automatic machine learning to build a predictive model that determines whether it is worthwhile to run the OpenCL code on the GPU or the OpenMP code on the multicore host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on distinct GPU-based systems. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) on Core i7/NVIDIA GeForce GTX580 and Core i7/AMD Radeon 7970 platforms, respectively, over a sequential baseline. Our approach achieves, on average, greater than 10× speedups over two state-of-the-art automatic GPU code generators.
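The predictive model chooses, per program, whether to run the generated OpenCL code on the GPU or the original OpenMP code on the host. The sketch below is a minimal stand-in under stated assumptions. It uses a single hypothetical feature (operations per byte copied to the GPU) and a one-level decision stump, not whatever model and features the authors trained. The stump picks the threshold that misclassifies the fewest labelled training programs. `TRAINING`, `fit_stump` and `choose_target` are hypothetical names, and the data are made up.

```python
# Hypothetical labelled training programs: (operations per byte copied to the
# GPU, True if the OpenCL version beat the OpenMP version on the host).
TRAINING = [
    (0.5, False), (1.2, False), (2.0, False), (3.0, False),
    (4.5, True), (6.0, True), (9.5, True), (15.0, True),
]


def fit_stump(samples):
    """Return the threshold with the fewest training errors, where the rule
    is 'run on the GPU if intensity >= threshold'."""
    values = sorted({x for x, _ in samples})
    # Candidates: midpoints between neighbouring values, plus one threshold
    # below every value (always GPU) and one above every value (never GPU).
    candidates = [values[0] - 1.0, values[-1] + 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((x >= threshold) != label for x, label in samples)

    return min(candidates, key=errors)


def choose_target(intensity, threshold):
    """Apply the learned rule to a new program's feature value."""
    return "OpenCL on GPU" if intensity >= threshold else "OpenMP on host"


threshold = fit_stump(TRAINING)
print(threshold, choose_target(7.0, threshold))
```

A real model would typically combine several features, such as data-transfer volume, loop trip counts and memory-access patterns. The single-feature stump only illustrates the shape of the decision.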

    Patellar tendon properties distinguish elite from non-elite soccer players and are related to peak horizontal but not vertical power

    Purpose: The aims of our study were to investigate differences in tendon properties between elite and non-elite soccer players, and to establish whether tendon properties were related to power assessed during unilateral jumps in different directions. Methods: Elite (n=16; age, 18.1 ± 1.0 yrs) and non-elite (n=13; age, 22.3 ± 2.7 yrs) soccer players performed three repetitions of each type of countermovement jump (CMJ) on a force plate: unilateral vertical, unilateral horizontal-forward and unilateral medial. Patellar tendon (PT) cross-sectional area (CSA), elongation, stiffness and Young’s modulus (measured at the highest common force interval) were assessed with ultrasonography and isokinetic dynamometry. Results: Elite soccer players demonstrated greater PT elongation (6.83±1.87 vs. 4.92±1.88 mm, P=0.011) and strain (11.73±3.25 vs. 8.38±3.06 %, P=0.009) than non-elite players. Projectile range and peak horizontal power during unilateral horizontal-forward CMJ correlated positively with tendon elongation (r=0.657 and 0.693, P<0.001) but inversely with Young’s modulus (r=-0.376 and -0.402, P=0.044 and 0.031). Peak medial power during unilateral medial CMJ correlated positively with tendon elongation (r=0.658, P<0.001) but inversely with tendon stiffness (r=-0.368, P=0.050). No tendon property correlated with unilateral vertical CMJ performance (r≤0.168; P≥0.204). Conclusions: Patellar tendon strain was greater in elite than in non-elite soccer players and can therefore be considered an indicator of elite soccer playing status. Moreover, a more compliant patellar tendon appears to facilitate unilateral horizontal-forward and medial, but not vertical, CMJ performance in soccer players. These findings should be considered when prescribing talent selection and development protocols related to direction-specific power in elite soccer players.
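The tendon properties named above are conventionally related by the standard textbook definitions below. The abstract does not state exactly how the study computed each quantity, so treat this as background rather than the authors' procedure. Here ΔL is tendon elongation, L_0 the resting tendon length, ΔF the force increment over the analysed interval and CSA the cross-sectional area. Strain is multiplied by 100 to give the percentages quoted.

```latex
\varepsilon = \frac{\Delta L}{L_0}, \qquad
k = \frac{\Delta F}{\Delta L}, \qquad
E = \frac{\Delta F / \mathrm{CSA}}{\Delta L / L_0} = k \, \frac{L_0}{\mathrm{CSA}}
```

Under these definitions, a lower stiffness k or Young's modulus E corresponds to a more compliant tendon. That is the sense of "compliant" in the conclusions.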

    The Heidelberg Milestones Communication Approach (MCA) for patients with prognosis <12 months: protocol for a mixed-methods study including a randomized controlled trial

    Background: The care needs of patients with a limited prognosis (<12 months median) are complex and dynamic. Patients and caregivers must cope with many challenges, including physical symptoms and disabilities, uncertainty, and compromised self-efficacy. Healthcare is often characterized by disruptions in the transition between healthcare providers. The Milestones Communication Approach (MCA) is a structured, proactive, interprofessional concept that involves physicians and nurses and is aimed at providing coherent care across the disease trajectory. This study aims to evaluate the following aspects of MCA: (1) the training of healthcare professionals, (2) implementation context and outcomes, (3) patient outcomes, and (4) effects on interprofessional collaboration. Methods/design: A multiphase mixed-methods design will be used. A total of 100 patients and 120 healthcare professionals in a specialized oncology hospital will be involved. Training outcomes will be documented using a questionnaire. Implementation context and outcomes will be explored through semi-structured interviews and written questionnaires with healthcare professionals and training participants, and through a content analysis of patient files. Patient outcomes will be assessed in a pragmatic, non-blinded randomized controlled trial and in qualitative interviews with patients and caregivers. Trial outcomes are supportive care needs (SCNS-SF34-G), quality of life (SEIQoL and FACT-L), depression and anxiety symptoms (PHQ-4), and distress (Distress Thermometer). Qualitative semi-structured interviews on patients’ views will focus on shared decision-making, communication needs, feeling empathy, and further utilization of healthcare services. Interprofessional collaboration will be explored using the UWE-IP-D before the implementation of MCA (t0) and after 3 (t1), 9 (t2), and 12 (t3) months.
Discussion: Using guideline-concordant early palliative care, MCA aims to foster patient-centered communication with shared decision-making and facilitation of advance care planning, including end-of-life decisions. This is expected to increase patient quality of life and decrease aggressive medical care at the end of life. It is assumed that the communication skills training and interprofessional coaching will improve the communication behavior of healthcare providers and influence team communications and team processes. Trial registration: German Clinical Trials Register, DRKS00013649 and DRKS00013469. Registered on 22 December 2017.

    Automatic Tuning of Inlining Heuristics for Java JIT Compilation

    Inlining improves the performance of programs by reducing the overhead of method invocation and increasing the opportunities for compiler optimization. Incorrect inlining decisions, however, can degrade both the running and compilation time of a program. This is especially important for a dynamically compiled language such as Java. Therefore, the heuristics that control inlining must be carefully tuned to balance these two costs and so reduce total execution time. This paper develops a genetic-algorithm-based approach to automatically tune a dynamic compiler’s internal inlining heuristic. We evaluate our technique within the Jikes RVM [1] compiler and show a 17% average reduction in total execution time on the SPECjvm98 benchmark suite on a Pentium 4. When applied to the DaCapo benchmark suite, our approach reduces total execution time by 37%, outperforming all existing techniques.
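A genetic-algorithm search over inlining parameters can be sketched as follows. This is a minimal illustration, not the authors' implementation. The two knobs (`max_callee_size`, `max_inline_depth`), their bounds, the `DEFAULT` setting and the synthetic `total_time` cost are all hypothetical. In the real setting, each fitness evaluation would be a benchmark run in Jikes RVM measuring compilation plus running time.

```python
import random

# Hypothetical inlining knobs and their allowed integer ranges.
BOUNDS = {"max_callee_size": (1, 100), "max_inline_depth": (1, 10)}
# Hypothetical hand-tuned default heuristic used as a baseline.
DEFAULT = {"max_callee_size": 23, "max_inline_depth": 5}


def total_time(params):
    """Synthetic stand-in for a benchmark run: more aggressive inlining cuts
    running time with diminishing returns but adds compilation time."""
    size, depth = params["max_callee_size"], params["max_inline_depth"]
    running = 100.0 / (1 + 0.05 * size) + 20.0 / (1 + depth)
    compiling = 0.1 * size + 0.02 * size * depth
    return running + compiling


def random_individual(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    """Uniform crossover: each knob is copied from either parent at random."""
    return {k: (a if rng.random() < 0.5 else b)[k] for k in BOUNDS}


def mutate(individual, rng, rate=0.2):
    """Reset each knob to a random legal value with probability `rate`."""
    child = dict(individual)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            child[k] = rng.randint(lo, hi)
    return child


def tournament(population, rng, k=3):
    """Pick the fittest (lowest total time) of k random individuals."""
    return min(rng.sample(population, k), key=total_time)


def evolve(generations=30, size=20, seed=0):
    rng = random.Random(seed)
    # Seed with the default so the search can only match or improve on it.
    population = [dict(DEFAULT)] + [random_individual(rng) for _ in range(size - 1)]
    for _ in range(generations):
        elite = min(population, key=total_time)  # elitism: best survives as-is
        children = [
            mutate(crossover(tournament(population, rng),
                             tournament(population, rng), rng), rng)
            for _ in range(size - 1)
        ]
        population = [elite] + children
    return min(population, key=total_time)


best = evolve()
print(best, round(total_time(best), 2), round(total_time(DEFAULT), 2))
```

The sketch seeds the population with the default heuristic and carries the best individual forward unchanged (elitism). Together these guarantee that the returned setting is never worse than the default under the fitness function used.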

    Automatic Tuning of Inlining Heuristics

    Inlining improves the performance of programs by reducing the overhead of method invocation and increasing the opportunities for compiler optimization. Incorrect inlining decisions, however, can degrade both the running and compilation time of a program. This is especially important for a dynamically compiled language such as Java. Therefore, the heuristics that control inlining must be carefully tuned to balance these two costs and so reduce total execution time. This paper develops a genetic-algorithm-based approach to automatically tune a dynamic compiler's internal inlining heuristic. We evaluate our technique within the Jikes RVM [1] compiler and show a 17% average reduction in total execution time on the SPECjvm98 benchmark suite on a Pentium 4. When applied to the DaCapo benchmark suite, our approach reduces total execution time by 37%, outperforming all existing techniques.

    Microarchitectural Design Space Exploration Using an Architecture-Centric Approach

    The microarchitectural design space of a new processor is too large for an architect to evaluate in its entirety. Even with the use of statistical simulation, evaluating a single configuration can take excessive time because a set of benchmarks must be run with realistic workloads. This paper proposes a novel machine learning model that can quickly and accurately predict the performance and energy consumption of any set of programs on any microarchitectural configuration. This architecture-centric approach uses prior knowledge from off-line training and applies it across benchmarks. This allows our model to predict the performance of any new program across the entire microarchitecture configuration space with just 32 further simulations. We compare our approach to a state-of-the-art program-specific predictor and show that we significantly reduce prediction error. We reduce the average error when predicting performance from 24% to just 7% and increase the correlation coefficient from 0.55 to 0.95. We then show that this predictor can be used to guide the search of the design space, selecting the best configuration for energy-delay in just 3 further simulations and reducing the energy-delay to 0.85. We also evaluate the cost of off-line learning and show that we can still achieve a high level of accuracy when using just 5 benchmarks to train. Finally, we analyse our design space and show how different microarchitectural parameters can affect the cycles, energy and energy-delay of the architectural configurations.
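One way to read the architecture-centric idea is as follows. A new program's performance across the configuration space is expressed as a weighted combination of the responses of the programs used in off-line training. The weights are fitted from a handful of simulations of the new program. The sketch below assumes exactly that linear-combination form; the paper's model may include more. It uses made-up cycle counts for two hypothetical training programs and three sampled configurations instead of the 32 the abstract mentions. It fits the weights by least squares and then computes two accuracy measures, assuming the abstract's "average error" means mean relative error and its correlation coefficient is Pearson's r.

```python
import math

# Made-up off-line data: cycles (in millions) for two hypothetical training
# programs on each of six microarchitectural configurations.
TRAIN = {
    "A": [10.0, 12.0, 15.0, 20.0, 25.0, 30.0],
    "B": [40.0, 35.0, 30.0, 28.0, 22.0, 20.0],
}


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_weights(train, sampled, measured):
    """Least-squares weights w so that new ~= sum_p w[p] * train[p] on the
    sampled configurations, via the normal equations."""
    names = sorted(train)
    rows = [[train[p][c] for p in names] for c in sampled]
    k = len(names)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, measured)) for i in range(k)]
    return dict(zip(names, solve(xtx, xty)))


def predict_all(train, weights):
    """Predict the new program on every configuration."""
    n_configs = len(next(iter(train.values())))
    return [sum(w * train[p][c] for p, w in weights.items()) for c in range(n_configs)]


def mean_relative_error(pred, actual):
    return sum(abs(p - a) / a for p, a in zip(pred, actual)) / len(actual)


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# A new program that happens to behave like 0.5*A + 2*B; only configurations
# 0, 2 and 5 are "simulated" for it.
actual = [0.5 * a + 2.0 * b for a, b in zip(TRAIN["A"], TRAIN["B"])]
sampled = [0, 2, 5]
weights = fit_weights(TRAIN, sampled, [actual[c] for c in sampled])
pred = predict_all(TRAIN, weights)
print(weights, mean_relative_error(pred, actual), pearson(pred, actual))
```

Because the toy program is an exact combination of the training programs, the fit here is exact. With real simulation data the residual error is what a mean-relative-error figure like the abstract's 7% would measure.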